Crawling Chinese-Myanmar Parallel Corpus: Automatic Collection, Screening and Cleaning Corpus



Related resources

Automatic English - Chinese Parallel Corpus Acquisition and Sentences Extraction

The Internet holds many valuable resources that can supply parallel corpora across languages and subject areas. Earlier methods were developed for this mining task, but they often rely on a single feature. We use multiple reasonable features of parallel pages to acquire a parallel corpus. Finally, we add an SVM classifier that utilizes all the features to ...


Automatic Acquisition of Chinese-English Parallel Corpus from the Web

Abstract. Parallel corpora are a valuable resource for tasks such as cross-language information retrieval and data-driven natural language processing systems. Previously, only small-scale corpora have been available, restricting their practical use. This paper describes a system that overcomes this limitation by automatically collecting high-quality parallel bilingual corpora from the web. ...


Focused Web Corpus Crawling

In web corpus construction, crawling is a necessary step, and it is probably the most costly of all, because it requires expensive bandwidth usage, and excess crawling increases storage requirements. Excess crawling results from the fact that the web contains a lot of redundant content (duplicates and near-duplicates), as well as other material not suitable or desirable for inclusion in web cor...


Automatic Reordering Rule Generation Based On Parallel Tagged Aligned Corpus for Myanmar-English Machine Translation

Reordering is an important problem to consider when translating between language pairs with different word orders. Myanmar is a verb-final language, and reordering is needed when it is translated into languages whose word order differs from Myanmar's. In this paper, automatic reordering rule generation for Myanmar-English machine translation is presented. In order to generate...


3-Step Parallel Corpus Cleaning Using Monolingual Crowd Workers

A high-quality parallel corpus needs to be manually created to achieve good machine translation for domains that do not have enough existing resources. Although the quality of the corpus can be improved to some extent by asking professional translators to translate, it is impossible to avoid mistakes completely. In this paper, we propose a framework for cleaning the existing...



Journal

Journal title: IOP Conference Series: Materials Science and Engineering

Year: 2019

ISSN: 1757-899X

DOI: 10.1088/1757-899x/646/1/012046